ABSTRACT

The design, implementation, and evaluation of an
adaptive scheduling algorithm for the MUNIX operating system
are reported here. MUNIX, a multiprocessing version of UNIX,
was designed to run on a dual PDP 11/45 multiprocessor
system. Topics covered include: a survey of adaptive
scheduling, laboratory equipment configuration, scheduling
with MUNIX, benchmark testing, and non-adaptive scheduling
changes. Conclusions and suggestions for possible
improvements are also included.

I. INTRODUCTION

In the fall of 1974, the Computer Science Group at the
Naval Postgraduate School acquired a fairly large amount of
computer hardware and a limited amount of software. The
intent of the acquisition was to integrate the hardware and
software to support signal processing research. The hardware
consisted of two PDP 11/50 computers made by Digital
Equipment Corporation, one CSP 30 processor made by Computer
Signal Processors Incorporated, and various associated
peripherals described in section II.B.2 and section II.B.3
(see figure 3).

An agreement with Bell Laboratories provided the
software, consisting of an operating system called UNIX [15].
UNIX, as delivered, did not have the capability to fully
utilize all the system resources necessary to support signal
processing. As a result, several research projects were done
in this area. MUNIX [7], a multiprocessing version of UNIX,
was one of the projects done. Note that where the word MUNIX
is used in this thesis, UNIX may be substituted. The
changes made to MUNIX may easily be incorporated into UNIX.

One of the goals of the computer system was to handle
real-time, timeshared, and batch processing [11]. It was
found that the scheduling algorithm used in UNIX could be
improved for the equipment configuration being used. An
excessive amount of time was being wasted swapping users in
and out of core. This wasted time was a function of both the
scheduling algorithm and the amount of available core.
Figure 1 shows the amount of real time, in minutes and
seconds, required to run the same benchmark program (see
Appendix B) with different amounts of core available.


[Figure 1: core size (K words) plotted against benchmark real time (min.)]

Figure 1. Benchmark Real Time Vs. Core Size


Implementing a scheduling algorithm that gave the
interactive user faster response times and increased
throughput was desired. Since a member of the Computer
Science Group was interested in adaptive scheduling, thesis
research was accomplished in this area and is reported here.


II.  BACKGROUND  INFORMATION 

A.  ADAPTIVE  SCHEDULING  -  A  SURVEY 

1. General

Scheduling algorithms can be placed into three
basic classifications: round robin, priority, and dynamic
or adaptive control. Algorithms based on round robin and
priority assignment techniques have a common characteristic:
the processor is switched from the process currently being
serviced to a new process at the end of a fixed time quantum
or when a new process is of higher priority. This switching
usually has a significant overhead and reduces system
utilization.

When an operating system is getting close to
saturation, the round robin scheduling algorithm often fails
to give an adequate response time to the time-sharing user
[16]. With a priority type algorithm, processes are assigned
priorities as they are entered into the system. The user
supplies the information necessary for the operating system
to assign a priority to the process. The information
supplied can consist of an estimate of CPU time, an estimate
of the amount of core required, and the number of
input/output devices required. This information usually is
an estimate of the maximum time or the maximum amount of
primary memory required.

Adaptive control solves the problems of the round robin
and priority scheduling algorithms by giving adequate
turnaround times to all processes except those which are run in
the background, that is, processes not in contention for
immediate service. Several papers illustrating adaptive
control scheduling techniques are discussed below.

Northouse and Fu [13] develop a scheduling algorithm
based  on  adaptive  control  and  clustering  techniques. 
Bernstein  and  Sharp  [2]  and  Sharp  and  Roberts  [16]  develop 
an  algorithm  based  on  the  principle,  "don't  do  anything 
unless  you  have  to."  This  avoids  the  system  overhead  of 
process-switching  and  swapping  as  much  as  possible. 

2. Adaptive-Control and Clustering Technique

An adaptive controller can be referred to as a
closed-loop or a feedback type system with the
characteristics of the controlled process. Northouse and Fu
[13] proposed their batch scheduler as an adaptive
controller with three basic units: a classifier, a
performance evaluator, and a distributor.

The classifier made an "a priori" classification of all
incoming jobs based on information supplied by the user on
his job card. A clustering algorithm was used to establish
clusters. The parameters used by the clustering algorithm
were:

1) CPU time used.

2) number of tape drives.

3) number of input cards.

4) programming language.

5) number of drum or disk files.

6) number of output pages.

Much effort went into the proper classification of
clusters with the following final result:

cluster I - medium CPU, large tape file jobs

cluster II - large jobs

cluster III - small jobs

cluster IV - medium CPU, small tape file jobs

The  performance  evaluator  monitored  the  system 
performance  in  specific  areas  and  compared  these  evaluations 
to  desired  responses.  Specific  areas  monitored  included 
central processor utilization, printer traffic, drum and/or
disk traffic, and tape drive utilization. If efficiency in a
monitored area dropped below a minimum acceptance level, a
change in the job stream was made.

A performance index was calculated and attempts were
made to optimize this index for the next subinterval
(variable lengths). The job stream was then determined
using this index and a linear programming technique. The
distributor implemented the job stream that was calculated
by the performance evaluator.

As jobs were executed, their statistics were used by
two more components called the data collector and the data
base updater. The updater made the system a closed loop
system by continually updating the data base from which the
linear programming function made its decisions.

Northouse and Fu, after running two simulations on
their scheduler, concluded:

1) The scheduler was able to adapt to changing
workloads.

2) The job stream had definite clusters.

3) The programming language was an important
parameter for classifying clusters.

4) The distributor required few calculations and
was easily updated.

5) Selected clusters could be forced through the
system, reducing their turnaround time.

6) Using a proper selection policy had a
significant impact on decreasing turnaround time.


3.  Adaptive  Policy  Driven  Scheduler 

A "Policy-Driven Scheduler" attempts to deliver
computational resources at a rate determined by some type of
criterion or policy function. Bernstein and Sharp [2] define
their policy functions in terms of "resource units" and "age
of interaction." An interaction consists of: a request from
a user to the system, some system service, and a reply from
the system back to the user. An edit command on a
time-sharing system is an example of an interaction. The
execution of a batch job is another example.

The  age  of  the  interaction  at  some  time  "t"  is  the 
elapsed  time  from  when  the  user  made  the  request  to  the 
system  until  time  "t".  Resource  units  are  a  measure  of 
service received by a particular interaction. Wi is an
arbitrary non-negative time cost for the i-th resource and
Rij(t) is the number of units of the i-th resource used by
the j-th interaction at age "t". The total resource units
of the j-th interaction at age "t", Rj(t), is equal to the
sum over all i of Wi times Rij(t).
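
The weighted sum that defines the total resource units can be illustrated with a short sketch. The function name, the three resource types, and the cost values are assumptions chosen for illustration; they are not taken from Bernstein and Sharp.

```python
def resource_units(weights, usage):
    """Return Rj(t): the sum over i of Wi * Rij(t)."""
    if len(weights) != len(usage):
        raise ValueError("one cost Wi is needed per resource")
    if any(w < 0 for w in weights):
        raise ValueError("costs Wi must be non-negative")
    return sum(w * r for w, r in zip(weights, usage))


# Hypothetical costs for three resources (for example CPU seconds,
# disk transfers, and terminal lines).
W = [3, 2, 1]
# Hypothetical units of each resource used by interaction j at age t.
R_j = [4, 10, 20]

total = resource_units(W, R_j)   # 3*4 + 2*10 + 1*20
```

Raising or lowering a single cost Wi changes how heavily that resource counts toward the service an interaction is considered to have received.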


[Figure 2: resource count and policy function F(t) plotted against
age of interaction, with ages tc and t1 marked]

Figure 2. Resource Count and Policy Function vs Age

The resource count is a nondecreasing function of time.
Figure 2 shows the resource count function for the j-th
interaction up to age t1. A slope of zero indicates periods
of no service while a positive slope indicates periods of
resource consumption. The policy function, F(t), is shown as
a curve.

The goal of the scheduling algorithm is to keep the
terminal point of each interaction above the policy
function. The total amount of resources required to complete
an interaction is not usually known in advance. Thus, the
algorithm tries to maintain Rj(t) greater than or equal to
F(t).

An interaction is critical at time "t" if Rj(t) is less
than F(t). Interactions are ordered according to a measure
called "critical time." Critical time is defined as:

t0 + tc - t1,
where t0 is the current time and t1 is the current age of
the interaction. tc is the last age of the interaction at
which it went critical. Figure 2 shows an interaction
of age t1 which became critical at age tc and which is still
critical.

Critical time changes only when service is received and
thus only needs to be updated at that time. This property
ensures that the queues of processes ordered by this
quantity remain ordered as time progresses. After a process
receives service it must be relocated in the queues.
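
The reason the queues stay ordered can be seen from the definition: the age t1 grows at the same rate as the current time t0, so t0 + tc - t1 stays fixed while tc is unchanged. The sketch below uses hypothetical times (a request made at time 100 that went critical at age 5) and assumes no service is received between samples.

```python
def critical_time(t0, tc, t1):
    """Critical time as defined in the text: t0 + tc - t1."""
    return t0 + tc - t1


request_time = 100   # hypothetical time at which the request was made
tc = 5               # hypothetical age at which the interaction went critical

samples = []
for t0 in (110, 125, 160):
    t1 = t0 - request_time          # age of the interaction at time t0
    samples.append(critical_time(t0, tc, t1))
```

Every sample equals request_time + tc, the instant at which the interaction went critical, so an earlier value means the interaction has been critical for longer.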

The scheduler has two queues. The "core queue" is a
linked list of processes in main memory and the "drum queue"
is a linked list of processes not in main memory. Both
queues are ordered by critical time. Processes from the
head of the "drum queue" are transferred into main memory if
room is available. If room is not available for a process,
a swapping decision is made based on a comparison of the
critical times of the first process on the drum queue and
the last process on the core queue.

This scheduler reduces overhead caused by unnecessary
swapping by prohibiting the replacement of a process in core
by a noncritical process which is not in core. The rules
that make up the swapping decision are:


1) A critical process in core is not eligible to be
swapped out (designed to prevent thrashing).

2) Processes which are inactive because they are
awaiting communication from a terminal are given a
critical time of t(e).

3) A noncritical process which is not in main memory
will be swapped in if room exists or if room can be
created by swapping out processes with a critical time
of t(e).

4) A critical process which is not in main memory will
displace a noncritical process which is in main memory.

Bernstein and Sharp were unable to detect thrashing using
these constraints.
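
The four swapping rules can be read as a single test on a pair of processes. The following is a minimal sketch under stated assumptions: the Proc record, its field names, and the function may_displace are hypothetical, and a process awaiting terminal input is assumed to be displaceable by any incoming process.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    name: str
    critical: bool                    # resource count is below the policy function
    awaiting_terminal: bool = False   # rule 2: holds a critical time of t(e)


def may_displace(incoming: Proc, resident: Proc) -> bool:
    """Return True if `incoming` (not in core) may force `resident` out of core."""
    if resident.awaiting_terminal:
        # Rule 3: room may be created by swapping out t(e) processes.
        return True
    if resident.critical:
        # Rule 1: a critical process in core is never swapped out.
        return False
    # Rule 4: only a critical incoming process displaces a
    # noncritical resident process.
    return incoming.critical
```

Under these rules a noncritical process on the drum can never push an active process out of core, which is the source of the reduction in swapping.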

The most critical process which is in main memory and
ready to execute is given the processor. The period of use
is one time quantum or until the process voluntarily
relinquishes it, whichever occurs first. The processor is
again dispatched after a swapping decision is considered.

The  policy  function  controls  the  service  received  by 
each  interaction.  Static  policy  functions  must  be  set 
conservatively  to  avoid  response  problems  during  heavy 
loads; however, it was found that during light load periods
these  conservative  settings  resulted  in  a  wide  range  of 
service  rates  to  similar  jobs.  Sharp  and  Roberts  [16]  found 
that  varying  the  policy  functions  as  the  job  load  changed 
greatly  reduced  the  service  variance. 

Bernstein  and  Sharp  showed  that  their  policy  driven 
scheduler  was  far  better  than  a  round  robin  scheduler  in 
terms  of  "internal  response  time"  measured  in  seconds: 


SCHEDULER        MINIMUM   MEDIAN   MAXIMUM

Round Robin        4.5      10.8     102.a
Policy-Driven      0.5       1.7       3.8

Table  I.  Round  Robin  Vs  Policy-Driven  Scheduler 


Sharp and Roberts demonstrated that their adaptive policy
driven scheduler was far better than the static policy
driven scheduler with the following results:

SCHEDULER       RESPONSE TIME   CPU UTILIZATION   DISC UTILIZATION

Non-Adaptive      5.1 sec.           61%               not
Adaptive          1.7 sec.           66%               44%


Table  II.  Non-Adaptive  Vs  Adaptive  Scheduler 


Note the differences in the magnitude of the response times
in both comparisons.

An adaptive policy driven scheduler as it pertains to
the MUNIX operating system for the PDP-11/50 will be
discussed in detail in Chapter III.

B.  LABORATORY  EQUIPMENT  CONFIGURATION 

1. General

Although MUNIX is a multiprocessor operating
system, all testing was done with only one processor active.
This was done to control testing. Documentation of the
laboratory equipment configuration is necessary because test
results depend on which of the two systems is used. Figure
3 shows the laboratory equipment and configuration. During
the design and implementation of the adaptive scheduler, the
operational equipment consisted of two PDP 11/50 CPU's
(labeled A and B) with the following equipment:

2. System A


32K MOS memory (450 nsec. access time)

16K core memory (750 nsec. access time)

16K CSPI memory (900 nsec. access time)

1 DEC LA30-C terminal

1 disk cartridge (RK05 equivalent)

1 Versatec printer/plotter

1 paper tape reader/punch

1 Vector General 3D3I vector display terminal

1 Ramtek raster scan color display unit

1 Tektronix 4014 display terminal

1 Hughes Conographic console

1 Data Tablet

1 EPC graphic recorder


3.  System  B 


96K CMI core (850 nsec. access time)

16K CSPI core (900 nsec. access time)

1 DEC LA30-C terminal

2 DECpack disk cartridges (RK05)

1 DEC DH11-AC communications multiplexor connected
to (up to 16) remote terminals

1 card reader (600 cards/min)

1 impact printer (400 lines/min)

2 nine track magnetic tape drives

1 seven track magnetic tape drive


4. System A and B Differences

The most important difference between the two
systems is the amount of memory available. System B has more
memory than system A and therefore can have more processes
in resident core. This has a significant effect on any
benchmark testing (see section II.D). In addition, the
speed of the memory must also be considered.

C. SCHEDULING WITH MUNIX ON THE PDP-11/50

MUNIX [7] is a multiprocessing version of UNIX [15], a
general purpose, multiuser, interactive operating system
usable on the Digital Equipment Corporation PDP-11/40,
PDP-11/45, and PDP-11/50 computers. UNIX was developed by
Bell Laboratories.

In order to understand the MUNIX scheduling algorithm
and implement a new one it was necessary to learn C, a high
level programming language. Several references were helpful
in this endeavor [1,10,18].

Although parts of the MUNIX operating system have been
documented [7,12], Appendix A will attempt to completely
document the portions related to scheduling.

D.  BENCHMARK  TESTING 

Benchmark testing is often used to evaluate and compare
the performance of one computer system relative to another.
A benchmark program was written to test scheduling
algorithms. It was necessary to use processes that
accomplished input, output, computations, and compilations.
A discussion of the benchmark program used may be found in
Appendix B.

E. NON-ADAPTIVE SCHEDULING CHANGES

1. General

After studying the scheduling algorithm used by
MUNIX, it was decided to make some non-adaptive changes
before implementing an adaptive scheduler. Three areas were
modified to make the algorithm more efficient and to provide a
better basis for performance evaluations of the
adaptive scheduler. The changes are documented in detail in
Appendix C and summarized here.

2.  Maximum  Number  of  Processes  (NPROC) 

a. Change

NPROC was a static upper limit for the number
of processes in the process table. The static upper limit
was replaced by a dynamic one, thus saving process table
search time.
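
One way a dynamic upper limit can shorten table searches is to track the highest slot in use and stop every scan there. The sketch below is only an illustration of that idea: the class ProcTable, its methods, and the table size of 50 are hypothetical, and the change actually made is the one documented in Appendix C.

```python
NPROC = 50   # hypothetical static size of the process table


class ProcTable:
    def __init__(self):
        self.slots = [None] * NPROC
        self.limit = 0   # dynamic bound: one past the highest slot in use

    def allocate(self, name):
        """Place a process in the first free slot and return its index."""
        for i in range(NPROC):
            if self.slots[i] is None:
                self.slots[i] = name
                self.limit = max(self.limit, i + 1)
                return i
        raise RuntimeError("process table full")

    def release(self, i):
        """Free a slot and pull the bound back past trailing free slots."""
        self.slots[i] = None
        while self.limit > 0 and self.slots[self.limit - 1] is None:
            self.limit -= 1

    def scan(self):
        """Visit only the slots below the dynamic bound."""
        return [p for p in self.slots[:self.limit] if p is not None]


table = ProcTable()
a = table.allocate("a")
b = table.allocate("b")
c = table.allocate("c")
table.release(c)      # bound drops from 3 to 2
table.release(a)      # bound stays at 2 because slot 1 is still in use
```

With only a few processes active, a scan in this sketch touches a few slots instead of all NPROC entries.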

b. Evaluation

The benchmark program (see Appendix B) was
run against the scheduling algorithm before and after this
change. Four runs were made, two with a drum being used for
/TMP files (temporary files), and two without. The results
are listed in tables III and IV. Real, user, and system
times are shown in minutes and seconds. Appendix B explains
how the system calculates these times and estimates their
accuracy. This testing was accomplished on the "B" system
(see section II.B.3).


               BEFORE CHANGES   AFTER CHANGES

real               6:02             6:00
user               2:12             2:00
sys                 :42              :41


Table  III.  NPROC  Change  Evaluation  with  No  Drum 


               Before Changes   After Changes

real               4:49             4:38
user               1:58             1:48
sys                 :41              :42


Table  IV.  NPROC  Change  Evaluation  with  Drum 


3.  Looping  in  Function  Sched 


a. Change

As described in section M.l.P and B.l.c.(S)
of Appendix A, two loops in sched were shortened so no
unnecessary code was executed.

b.  Evaluation 

The benchmark program (see Appendix B) was
run before and after the changes. Several runs were made
because the statistics showed no significant savings (see
Table V). Testing was accomplished on the "A" system.


               Before Changes   After Changes

real               7:08             7:08
user                :46.6            :46.6
sys                 :18.0            :17.8


Table  V.  Looping  Change  Evaluation 

4. Size Check

a. Change

Function sched was changed to make an
additional check before swapping a process out of core. The
size of the incoming process had to be smaller than or equal
to the size of the outgoing process. If the incoming
process is larger than all processes eligible for swapping,
no size check is made. This task is accomplished using two
passes. See Appendix C for a detailed explanation.
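
The two-pass size check can be sketched as follows. This is a simplified illustration, not the code of Appendix C: the function pick_victim and the (name, size, priority) tuples are hypothetical, and choosing the lowest priority candidate within each pass is an assumption borrowed from the algorithm description in section III.B.3.

```python
def pick_victim(incoming_size, eligible):
    """Choose a process to swap out of core, or None if there is none.

    `eligible` is a list of (name, size, priority) tuples for the
    processes that may be swapped out.
    """
    # First pass: accept only candidates at least as large as the
    # incoming process, so one swap-out frees enough core.
    fitting = [p for p in eligible if p[1] >= incoming_size]
    if fitting:
        return min(fitting, key=lambda p: p[2])[0]
    # Second pass: the incoming process is larger than every
    # candidate, so the size check is dropped.
    if eligible:
        return min(eligible, key=lambda p: p[2])[0]
    return None


in_core = [("a", 8, 1), ("b", 12, 5), ("c", 16, 3)]
first = pick_victim(10, in_core)    # "a" is too small; "c" beats "b"
second = pick_victim(20, in_core)   # nothing fits, so the check is skipped
```

The first pass avoids swapping out a small process that would not free enough core for the incoming one, which would otherwise force a further swap.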

b. Evaluation

Several runs were made with the benchmark
program on the "A" system because of the 26 percent savings
realized in real time (see Table VI). This change was also
tested on the "B" system but a savings of only 6 percent
was found there. The difference in savings is explained by
the significant difference in available memory (the "B"
system has three times as much user space) for each system.


               BEFORE CHANGES   AFTER CHANGES

real               7:08             5:18
user                :46.6            :45.3
sys                 :18.0            :17.9


Table VI. Size Check Change Evaluation
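
The real-time savings reported for the "A" system can be checked from the real times in Table VI. The helper to_seconds below is a hypothetical convenience for converting the table's minutes and seconds notation.

```python
def to_seconds(mmss):
    """Convert a 'm:ss' string such as '7:08' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + float(seconds)


before = to_seconds("7:08")   # real time before the size check
after = to_seconds("5:18")    # real time after the size check
saving_pct = 100.0 * (before - after) / before
```

The two real times differ by 110 seconds out of 428, a saving of about 25.7 percent.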

III. ADAPTIVE SCHEDULER

A. DESIGN

The adaptive scheduler described in section II.A.3
was implemented with a few minor changes.

1. Goals

There were two goals that the adaptive
scheduler attempted to meet.

a. Improve system throughput by reducing the amount of
process swapping (in and out of core).

b. Give the interactive user better response time. The
MUNIX scheduler basically gave users a round robin type of
service.

2. Design Changes

Sharp and Roberts [16] measured the
criticality of a process as the time from where the process
last went critical until the current time (see figure 2).
The implemented scheduler measures the critical time as the
vertical distance from the policy function to the resource
count. This design change was made to facilitate a more
efficient calculation of priorities. Sharp and Roberts
calculated priorities on a fixed period basis. In the
current scheme priorities are calculated whenever they are
needed. There are two reasons for this change:

1) The policy function is changed whenever the job
(process) load reaches predetermined amounts. When the
function is changed, all jobs, both in and out of core, have
to have their priorities recalculated with the new policy
function.

2) Depending on the fixed time period, jobs may receive
an excessive or insufficient amount of resources.

By recalculating the priorities on a continuing basis,
no special software is needed to recalculate the priorities
after policy functions have been changed. Also, every time a
scheduling decision is made, it is made with the latest
priorities of all the jobs concerned.
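
The idea of computing a priority on demand as a vertical distance can be sketched as follows. Everything specific here is an assumption made for illustration: the linear policy function, the load thresholds, and the function names are hypothetical. The sign convention (a value greater than zero means the process is not critical) is inferred from the remark in section IV.A.1 that critical processes were excluded by searching only for priorities greater than zero.

```python
def policy(age, rate):
    """Hypothetical policy function F(t): resources owed at a given age."""
    return rate * age


def rate_for_load(nproc):
    """Hypothetical load thresholds at which the policy function changes."""
    if nproc <= 5:
        return 3
    if nproc <= 10:
        return 2
    return 1


def priority(age, resources, nproc):
    """Vertical distance from the policy function to the resource count.

    A negative value means the resource count is below the policy
    function, that is, the process is critical.
    """
    return resources - policy(age, rate_for_load(nproc))


# The same process evaluated under a light and a heavy load.
light = priority(age=10, resources=15, nproc=4)
heavy = priority(age=10, resources=15, nproc=12)
```

Because the value is computed from the current load at the moment it is needed, a change of policy function requires no separate pass over the process table.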

B.  IMPLEMENTATION 

1. Process Table Control Variables

a. p_time - was changed from a character
variable (maximum value of 127) to an integer variable
(maximum value of 32767). The use of the variable was also
changed (see Appendix A section A.1.f). Currently it is used
as a counter for the total number of seconds since a process
(job) has last had any teletype input. It is also used to
calculate the priority of the process.

b. p_flag - was changed from a character
variable to an integer variable because two additional bits
were needed as special indicators.

1) PSTM - (value of 400 octal) when set
means that the process has received a minimum amount of
resources in a minimum amount of time and from this point on
will be run strictly as a background process.

2) TP1JATT - (value of 1000 octal) when
set means that this process is waiting for terminal input
and will not be scheduled to run again until terminal input
has been made.

c. p_resr - was added as an integer variable.
It is used to keep track of the amount of resources received
by the individual process. The U-vector (see Appendix A
section A.2) of each process has two variables, u_utime and
u_stime, that contain the same information. These variables
could not be used for two reasons:

1) The U-vector is in core only when the
process is core resident. Priorities must be calculated
both when the process is in and out of core.

2) Because of the inter-dependency of
processes [15] (page 370), the parent (or grandparent)
process accumulates the resource units (u_utime and u_stime)
of its children (or grandchildren).

This new variable only accumulates resource units for the
process it is related to. p_resr is incremented in program
clock.c in the same places as u_utime and u_stime.

2. Priority Calculations

Subroutine "schpri.c", schedule priority, was
written to calculate process priorities (see Appendix D).
There were two possible areas of the operating system in which
the priorities could be calculated.

a. Subroutine swtch, in program slp.c, is
entered each time the operating system changes (switches)
from one process to another.

b. Subroutine sched, also in program slp.c,
is entered each time a process is being considered for
swapping.

A test was run to examine the number of times the two
subroutines are executed relative to each other. It was
found that swtch is executed approximately thirteen times as
often as sched. Sched also decides which process should be
core resident while swtch decides which process should be
executed next. As a result of these two observations, sched
was chosen as the place to calculate priorities.

3.  Scheduling  Algorithm 

Subroutine sched (see Appendix A section B.1)
is described with adaptive changes. This subroutine is a
process with an infinite loop initiated from function main.
Sched is executed, put to sleep, awakened, and executed
again. Sched's main job is to swap processes in and out of
core. It accomplishes this task using the following
algorithm:

a. Set the first/second pass indicator
(fspass) to a value of first pass.

b. If there are any processes in the swap
file, find the one that is the most critical and try to
transfer it into core. If the swap file is empty, then go
to sleep on the RUNOUT flag which indicates there are no
processes in the swap file. When awakened go to "b." and
continue. Sched may try up to a maximum of three different
ways to bring this process into core.

c. If room exists in core then transfer the
process into core.

d. If room does not exist in core then sched
looks for what it calls "easy core". Easy core is core that
belongs to a process that is not a system process, not
locked, and waiting for some type of input or output. An
additional constraint that applies only on the first pass is
that the size of core needed for the incoming process is
less than or equal to the size of core of the outgoing
process. If easy core is found, then the outgoing process is
swapped out and sched goes to "c." to continue. Sched
repeats this until either the process can be transferred in,
or no easy core is available.

e. If no easy core is available, then sched
searches all the processes that are in core for the one that
has the lowest priority and meets the following constraints:
the process must not be a system process, not locked,
sleeping, and not currently being run on the other
processor. Two additional constraints, that are made on the
first pass only, are the size check mentioned in "d." above
and a check that the process can be ready to run instead of
sleeping. At this point there are two possible states to
consider.

1) The lowest priority process eligible
to be swapped out is critical. If this is the first pass,
change the first/second pass indicator to second pass and go
to "d."; or if this is the second pass and no processes are
eligible to be transferred out, go to sleep on the RUNIN
flag (processes are in the swap file). When awakened go to
"a." and continue.

2) If the lowest priority process
eligible to be swapped out is not critical or this is the
second pass and a process meets the requirements in "e."
above, then swap the indicated process out of core and go to
"c." and continue.

Two additional versions of the above algorithm were
tested, and will be discussed in section IV.A.
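
The branching in step "e." of the algorithm in section III.B.3 can be summarized as a small decision function. This is a sketch of the prose description only, not the code of sched: the function name, the constants, and the returned action strings are hypothetical, and treating a first pass that finds no candidate the same as one that finds a critical candidate is an assumption.

```python
FIRST_PASS, SECOND_PASS = 1, 2


def step_e_action(fspass, victim_found, victim_critical):
    """Return (next_pass, action) for step e.

    victim_found: some process in core met the constraints of step e.
    victim_critical: the lowest priority such process is critical.
    """
    if fspass == FIRST_PASS:
        if victim_found and not victim_critical:
            return FIRST_PASS, "swap out, go to c"
        # Critical (or no) candidate: retry with the first-pass-only
        # constraints dropped.
        return SECOND_PASS, "go to d"
    # Second pass: any process meeting the constraints is swapped
    # out, whether or not it is critical.
    if victim_found:
        return SECOND_PASS, "swap out, go to c"
    # Nothing can leave core; step a resets the pass indicator.
    return FIRST_PASS, "sleep on RUNIN, go to a"
```

The second pass is what allows a critical process to be swapped out as a last resort instead of leaving the incoming process waiting indefinitely.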

IV. CONCLUSIONS AND RECOMMENDATIONS

A. CONCLUSIONS

1. Critical Processes

a. Change

Sharp and Roberts' algorithm [16] did
not allow any process which was critical to be swapped out
of core. That same constraint was implemented by changing
B.3.e.1 and B.3.e.2 in section III as follows:

B.3.e.1 The lowest priority process eligible to be
swapped out is critical. If this is the first pass, change
the first/second pass indicator to second pass and go to
"d."; or if this is the second pass, go to sleep on the
RUNIN flag and when awakened go to "a." to continue.

B.3.e.2 If the lowest priority process eligible to be
swapped out is not critical, then swap the process out and
continue with "c." above.

This change was implemented by only searching for processes
with a priority greater than zero.

b. Evaluation

Because of the inter-relationship
between processes, this change was able to lock all of core
and put the operating system into a deadlock condition. For
example, consider the case where both the parent and the
child processes are critical and the parent is waiting for
the child's termination. The child cannot terminate because
it is not in core and cannot get into core because the
parent is critical (and cannot be swapped out).

2.  Non-Critical Processes

a.  Change

It was thought that by not trying to
transfer a non-critical process into core (by swapping
another out), swapping time could be saved. This change was
implemented by adding the following after III.B.3.e. above.

If the incoming process is not critical then go to
sleep on the RUNIN flag. When awakened go to "a." above to
continue.

b.  Evaluation 

Although this change did not cause a
deadlock, it did create an unsatisfactory result. Example:
The parent is out of core and has two children in core. The
first child is a compute bound job and has received a lot of
resource units. The parent becomes critical and forces the
non-critical child (compute bound) out. The second child
dies, the parent is now waiting on the first child, but the
child cannot get back into core until it is critical. So the
computer has nothing to do until the child is able to get
back into core by forcing the parent out and back into a new
location.

3.  Implemented Algorithm

a.  Change

All the changes are explained in section III.B.3.

b.  Evaluation

The benchmark program was run several
times on the A processor with the results listed in Table
VII.


BEFORE CHANGES

real    [illegible]:18
user    [illegible]
sys     [illegible]

AFTER CHANGES

real    8:33
user    [illegible]
sys     [illegible]

Table VII.  Implemented Algorithm Evaluation
It is obvious that the real time is slower. This can be
explained in part by the amount of time spent swapping
processes. The MUNIX scheduler solves this problem by
requiring that a process stay in core at least three seconds
and once out, stay out at least two seconds. A similar
effect could be accomplished with the adaptive scheduler by
changing the process' p_time. This was decided against
because the adaptive scheduler would then be a modified (and
probably less efficient) MUNIX scheduler.

4.  Goals

The goals of this thesis were not met by the
adaptive scheduler. However, they were met in part
(non-adaptive scheduling changes - section II.E.) by the research
accomplished while implementing the scheduler.

a. System throughput was improved by reducing
the amount of process swapping (see section II.E.4.).

b. The interactive user is not given a better
response time, but all users are. This was accomplished by
improving the efficiency of the current scheduling
algorithm.
B.  RECOMMENDATIONS

1.  Adaptive  Control 

a.  MUNIX 

The adaptive control scheduling
algorithm concept, when applied to MUNIX or UNIX, will need
more careful consideration. UNIX uses a hierarchical process
structure which creates process inter-dependency problems.
An example of the problem can be found in the "time sh
benchmark" command sequence (Appendix B). The "time" command
is the parent to the "sh" command, which is the parent to
the "benchmark" command file. The "benchmark" command file
will go two additional generations lower in all "C" compile
commands. It is entirely possible to be eight or nine
generations deep without executing any involved command
sequences. Thus, it is a frequent occurrence that the
currently active child will have numerous (intentionally)
waiting ancestors which have no computational requirement
until the child terminates. The failure of the present
adaptive control effort seems to stem largely from the fact
that each process was not considered on its own merit.

When a parent is waiting for a child to terminate, the
parent should not be in competition for resource units with
the child. There are two possible solutions to this problem:

1) Any process that is in the wait state
should not have its "p_time" increased. This would not allow
a process to change priority while it is waiting on another
process. This change would require a "status" check where
"p_time" is incremented.

2) If the parent process is waiting on a
child, set a status flag so that the parent will not be put
in contention with its non-terminated child. This change
would be considered the general case, but it would require
more software changes.
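
The first alternative might be sketched as follows. This is only an
illustration under stated assumptions, not the MUNIX source: the
cut-down structure proc_sketch, the functions in_wait_state and
age_processes, and the numeric values given to the status constants
are hypothetical. Only the names p_stat, p_time, SSLEEP, SWAIT, and
SRUN are taken from Appendix A. The sketch also assumes that p_time is
advanced in a once-per-second pass over the process table.

```c
#include <assert.h>

/* Status names follow Appendix A; the numeric values are assumed. */
enum { SSLEEP = 1, SWAIT = 2, SRUN = 3 };

/* Hypothetical, cut-down process block with only the fields used here. */
struct proc_sketch {
    int p_stat;   /* scheduling status */
    int p_time;   /* seconds in (or out of) core */
};

/* Returns nonzero when the process is in a wait state. */
int in_wait_state(const struct proc_sketch *p)
{
    return p->p_stat == SSLEEP || p->p_stat == SWAIT;
}

/*
 * Once-per-second aging pass with the proposed "status" check:
 * a process in a wait state keeps its p_time, so its priority
 * cannot change while it waits on another process.
 */
void age_processes(struct proc_sketch *tab, int n)
{
    int i;

    assert(n >= 0);
    for (i = 0; i < n; i++)
        if (!in_wait_state(&tab[i]))
            tab[i].p_time++;
}
```

With one SRUN entry and one SWAIT entry, both starting at p_time = 5,
a single pass leaves them at 6 and 5 respectively.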

b.  Other Operating Systems

Implementing an adaptive control
scheduling algorithm in an operating system with a minimal
hierarchical process structure seems to be straightforward.
Sharp and Roberts [16] reported no serious problems with
their implementation.

2.  Additional Research

Additional research should be undertaken to
analyze process interactions in UNIX. To do this, a
comprehensive systems instrumentation package must be
developed. In particular, a better timing mechanism, a
"complete" resource utilization accounting system, and a
selective event tracing capability are needed. In light of
the level of improvement Sharp and Roberts [16] report,
additional research with adaptive control and MUNIX appears
warranted.
APPENDIX  A:  PROCESSES  AND  SCHEDULING

A.  PROCESS  INFORMATION 

Any time the word "process" is used in this
appendix, the word "interaction" from section II may be
substituted.

1.  Process Table (proc.h)

This table contains the control information
for process scheduling. Currently, the table may contain up
to fifty active processes, each occupying a process block
with thirteen data elements describing it. A process is
assigned a process block when it is created and relinquishes
it when it is deactivated. The elements and their meanings
are:


a. p_stat - a process scheduling status with the
following possible states:

(1) SSLEEP - this process has been put to sleep
(not available to run).

(2) SWAIT - this process is waiting for some type
of input/output completion.

(3) SRUN - this process is ready to run.

(4) SIDL - this process is active but not in any
other status.

(5) SZOMB - this process has terminated but
information in the process control block is
required for other uses.


b. p_flag - process status of the memory manager with
the following possible states:

(1) SLOAD - this process is loaded in main memory.

(2) SSYS - this process is a system process.

(3) SLOCK - this process should not be swapped out
of main memory and is therefore locked in.

(4) SSWAP - this process is being swapped out of
core.

(5) SMDF - this process must run on the first (B)
processor.

(6) SMDS - this process must run on the second (A)
processor.

(7) SANYP - flag used for processor masking.

(8) SBRKPT - system break point, used as a debug
tool.

(9) SGOING - this process is currently being run
by some processor.


c. p_pri - process priority. The priorities are whole
numbers and range from -128 (highest) to 127 (lowest).

d. p_sig - process signal indicator.

e. p_uid - the unique id assigned to the user that
created this process.

f. p_time - the total time in seconds that this process
has been in or out of core.

g. p_ttyp - the id of the terminal associated with this
process.

h. p_pid - the unique id assigned to this process.

i. p_ppid - the unique id assigned to the parent of
this process.

j. p_addr - the address (memory or disk) of the first
word of the process' "u vector" (described below).

k. p_size - the amount of non-shareable core this
process needs.

l. p_wchan - holds a number that can be a channel
address or a special indicator. The process is put to
sleep or suspended with this number and it can only be
awakened or restarted using the same number.

m. *p_text - pointer to the shareable portion of a
process, if it exists.
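
The thirteen elements listed above could be declared roughly as
follows. This is a sketch rather than the contents of proc.h: the
element names and the table size of fifty come from the text, while
the C types, the tag names proc_sketch and text_sketch, the table name
proc_table, the helper find_proc, and the convention that a p_stat of
zero marks a free block are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NPROC 50                 /* table size stated in the text */

struct text_sketch;              /* shareable text segment; layout not shown */

/* One process block. Element names follow the list above; types are assumed. */
struct proc_sketch {
    char p_stat;                 /* scheduling status; 0 is assumed to mean "free" */
    int  p_flag;                 /* memory manager status bits */
    signed char p_pri;           /* priority */
    char p_sig;                  /* signal indicator */
    char p_uid;                  /* id of the creating user */
    char p_time;                 /* seconds in or out of core */
    int  p_ttyp;                 /* associated terminal */
    int  p_pid;                  /* process id */
    int  p_ppid;                 /* parent process id */
    int  p_addr;                 /* address of the u vector */
    int  p_size;                 /* non-shareable core needed */
    int  p_wchan;                /* number the process sleeps on */
    struct text_sketch *p_text;  /* shareable portion, if any */
};

struct proc_sketch proc_table[NPROC];

/* Linear search for an active entry with the given pid; NULL if absent. */
struct proc_sketch *find_proc(int pid)
{
    int i;

    assert(pid > 0);
    for (i = 0; i < NPROC; i++)
        if (proc_table[i].p_stat != 0 && proc_table[i].p_pid == pid)
            return &proc_table[i];
    return NULL;
}
```

Because the table is a fixed array that is always core resident, every
lookup of this kind is a linear scan over the process blocks.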


2.  U Vector (user.h)

The system associates 1024 bytes of storage
with each user process, called the "u vector". This storage
contains system per-process data and the system stack for
this process. An important difference between user.h and
proc.h is that proc.h is always core resident while user.h is
core resident only when the process it is associated with is
core resident. The MUNIX scheduling algorithm does not use
any elements from the u vector to make decisions.

B.  SYSTEM  FUNCTIONS  PERTINENT  TO  SCHEDULING 

The scheduling flow and basic system flow are
shown in Figure 4 [17].

1.  sched

This function is a process with an infinite
loop initiated from function main. Sched is executed, put to
sleep (see function sleep below), awakened (see function
wakeup below), and executed again. Sched's main job is to
swap processes in and out of core. It accomplishes this
task using the following algorithm: If there are any
processes in the swap file (out of core), find the one that
has been there the longest and try to transfer it into core.
If the swap file is empty, then go to sleep on the RUNOUT
flag which indicates there are no processes on the swap
file. When the longest waiting process has been found, sched
tries up to three different ways to transfer it into core.

Figure 4.  Scheduling Flow
(diagram boxes: FUNCTION SWTCH, FUNCTION SLEEP, FUNCTION SCHED,
USER PROCESS, FUNCTIONS TRAP, CLOCK, INTERRUPT HANDLERS)


a. If room exists in core then transfer the
process in.

b. If room does not exist in core then sched
looks for what it calls "easy core". Easy core is core that
belongs to a process that is not a system process, not
locked (eligible for transfer out), and waiting for some
type of input or output. If easy core is found, then that
process is transferred out and sched starts over by looking
for the process that has been in the swap file the longest
(this will be the same process that it found before), and
continues with "a" above. Sched repeats this until either
the process can be transferred in or no easy core is
available.
c. If no easy core is available then sched
makes two more checks to ensure that the process is
deserving enough to require another process to be
transferred out.

(1) If the process has not been out of
core for more than two seconds, sched goes to sleep on the
RUNIN flag which indicates there is at least one process on
the swap file.

(2) Find the process (it must be
sleeping or ready to run, but not running) that has been in
core the longest. If that process has not been in core at
least two seconds then sched goes to sleep on the RUNIN
flag. If it has been in more than two seconds, transfer it
out and start over by looking for the process that has been
out of core the longest and continuing with "a" above.
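
The three attempts described in "a" through "c" can be condensed into
a single decision step. The sketch below is illustrative only and is
not the sched source: the snapshot structure, the action names, and
the function sched_step are hypothetical, and the real function
gathers these facts by scanning the process table. The text does not
say what happens when the longest-resident process has been in core
exactly two seconds; the sketch assumes it may then be transferred
out.

```c
#include <assert.h>

enum action {
    SLEEP_RUNOUT,     /* nothing on the swap file */
    SWAP_IN,          /* a. room exists in core */
    SWAP_OUT_EASY,    /* b. transfer out an easy-core process, then retry */
    SWAP_OUT_OLDEST,  /* c.(2) transfer out the longest-resident process */
    SLEEP_RUNIN       /* give up for now */
};

/* Hypothetical summary of what sched would find in the process table. */
struct snapshot {
    int swapped;      /* number of processes on the swap file */
    int out_secs;     /* seconds the longest-waiting process has been out */
    int room;         /* nonzero if that process fits in free core */
    int easy_core;    /* nonzero if an easy-core process exists */
    int in_secs;      /* seconds the longest-resident candidate has been
                         in core, or -1 if there is no candidate */
};

enum action sched_step(const struct snapshot *s)
{
    assert(s->swapped >= 0);
    if (s->swapped == 0)
        return SLEEP_RUNOUT;
    if (s->room)
        return SWAP_IN;
    if (s->easy_core)
        return SWAP_OUT_EASY;
    if (s->out_secs <= 2)     /* c.(1): not out for more than two seconds */
        return SLEEP_RUNIN;
    if (s->in_secs < 2)       /* c.(2): not in core at least two seconds */
        return SLEEP_RUNIN;
    return SWAP_OUT_OLDEST;
}
```

After either kind of transfer out, the caller would take a fresh
snapshot and call sched_step again, which corresponds to "start over
and continue with a" in the description.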

2.  swtch

This function is invoked several places
throughout MUNIX to accomplish the task of rescheduling the
CPUs. Swtch searches the process table for the highest
priority process that is in core and ready to run on the
requesting CPU. If a process is found it is given the CPU,
otherwise the CPU is put in an idle state. It stays in an
idle state until started again by an interrupt.
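
The search that swtch performs might look like the sketch below. It
is not the MUNIX code: the structure proc_sketch, the function
pick_next, and the numeric values of SRUN and SLOAD are assumptions,
and the processor masking flags (SMDF, SMDS, SANYP) that restrict a
process to one CPU are left out for brevity. The sketch assumes, as
stated for p_pri in section A, that a numerically lower value is a
higher priority.

```c
#include <assert.h>
#include <stddef.h>

enum { SRUN = 3 };      /* status value assumed */
enum { SLOAD = 01 };    /* flag bit assumed */

/* Hypothetical, cut-down process block. */
struct proc_sketch {
    int p_stat;   /* scheduling status */
    int p_flag;   /* memory manager status bits */
    int p_pri;    /* priority; lower value = higher priority */
};

/*
 * Return the highest priority process that is in core and ready
 * to run, or NULL if there is none (the CPU would then idle until
 * an interrupt arrives).
 */
struct proc_sketch *pick_next(struct proc_sketch *tab, int n)
{
    struct proc_sketch *best = NULL;
    int i;

    assert(n >= 0);
    for (i = 0; i < n; i++) {
        struct proc_sketch *p = &tab[i];

        if (p->p_stat != SRUN || !(p->p_flag & SLOAD))
            continue;               /* not runnable or not in core */
        if (best == NULL || p->p_pri < best->p_pri)
            best = p;
    }
    return best;
}
```

A process that is ready but swapped out is skipped even if its
priority is better than that of every process in core.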
3.  sleep

This function is also invoked several places
throughout MUNIX. It will change the process' status from
ready to sleeping or waiting depending on the value of pri.
If pri is less than zero, a signal cannot disturb the sleep,
and the status is changed to SSLEEP. If pri is not less
than zero, the status is changed to SWAIT, and the process
may be disturbed by signals. Chan is an integer that
represents the reason the process has been placed in a wait
state (SWAIT or SSLEEP). After the process has been put in
a wait state, sleep calls swtch to find another process to
run.

4.  wakeup

This function changes the status of all
processes that have been put to sleep on chan from the wait
state to the SRUN state (ready to run). If any processes
awakened are on the swap file and sched is sleeping on the
RUNOUT flag, it is also awakened. When swtch is next called,
sched will be scheduled to run (sched is the highest
priority system process), and an attempt will be made to
find core for all the processes just awakened by wakeup.
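
The status changes made by sleep and wakeup can be illustrated
together. The sketch below covers only the bookkeeping described
above and is not the MUNIX source: the names sleep_on and wakeup_all,
the structure proc_sketch, and the numeric status values are
hypothetical, and the calls to swtch and the awakening of sched on
the RUNOUT flag are omitted.

```c
#include <assert.h>

enum { SSLEEP = 1, SWAIT = 2, SRUN = 3 };   /* numeric values assumed */

/* Hypothetical, cut-down process block. */
struct proc_sketch {
    int p_stat;    /* scheduling status */
    int p_wchan;   /* number the process is sleeping on */
};

/* Put a process in a wait state on chan; pri selects the state. */
void sleep_on(struct proc_sketch *p, int chan, int pri)
{
    assert(chan != 0);
    p->p_wchan = chan;
    /* pri < 0: signals cannot disturb the sleep */
    p->p_stat = (pri < 0) ? SSLEEP : SWAIT;
}

/* Make every process waiting on chan ready to run; return the count. */
int wakeup_all(struct proc_sketch *tab, int n, int chan)
{
    int i, woken = 0;

    for (i = 0; i < n; i++) {
        struct proc_sketch *p = &tab[i];

        if ((p->p_stat == SSLEEP || p->p_stat == SWAIT) &&
            p->p_wchan == chan) {
            p->p_wchan = 0;
            p->p_stat = SRUN;
            woken++;
        }
    }
    return woken;
}
```

Processes sleeping on a different number are untouched, which is why
a process can only be awakened with the same number it went to sleep
on.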
APPENDIX  B:  SYSTEM  BENCHMARK

A.  GENERAL 

This appendix contains information concerning the
benchmark program used for testing and evaluating scheduling
changes in this thesis.

B.  INDIVIDUAL  PROCESSES

Times for all the individual processes can be
found in Table VIII. The times used in Table VIII are from
the "A" system (see section II.B.2). It is significant to
note that the times on the "B" system are faster because
there is approximately three times as much "user core"
available. The benchmark consists of a series of eight
processes (discussed below) executed from a command file.

1.  chdir  /usr/sys 

Change the current working directory to the
new one specified; in this case the new working directory is
/usr/sys. This command does not create a new process, but
is directly executed in the shell.

2.  sh ld&

Execute the command file ld. File ld loads a
new operating system and places it in a file named "a.out".

3.  chdir conf

See 1. above.

4.  cc -c conf.c&

Compile without loading the C program named
"conf.c"; this "program" consists of data statements,
initialization, and no executable code. The compiled object
code goes in a file named "conf.o".

5.  chdir /usr/bench

See 1. above.

6.  cc -O rftest.c&

Compile the C program "rftest.c" using the
experimental object-code optimizer. The optimized
object-code goes in a file named "a.out".

7.  bas tower <towerin >/dev/null&

This is a compute bound process that has an
input file named "towerin" and an output file named
"/dev/null". The output file is a null device. Tower is an
interpretive execution of a recursive solution to the towers
of Brahma (Hanoi) problem which represents tokens as double
precision floating point numbers. Solution is for thirteen
disks and three towers.

8.  chdir /usr/sys/dmr

See 1. above.

9.  cc -c -O tm.c&

Compile the C program named "tm.c" without
loading it and with the experimental object-code optimizer.
The resulting object-code goes in a file named "tm.o".

10.  cp /munix /dev/null&

Copy the 34,800 byte file named "/munix" to
the file named "/dev/null".

11.  chdir /usr/sys

See 1. above.

12.  sum /usr/sys/lib1 >/dev/null&

Compute the checksum of the 60,390 byte file
named "/usr/sys/lib1" and output the number to the file
named "/dev/null".

13.  sum /usr/sys/lib2 >/dev/null&

See 12. above.

14.  wait

Wait  until  all  processes  started  with  "&" 
have  completed,  and  report  any  abnormal  terminations.  There 
is  no  measurable  time  associated  with  this  command  if  it 
stands  alone. 
All the times used in Table VIII have come from the
time command of UNIX [18]. Execution times (user and system)
are determined by sampling the state of the system at a 60
Hz rate (every 1/60 second). A counter is kept for each type
of time. Note that "nm" means not measurable. It is
significant to note that the execution time can depend on
what kind of memory the process happens to occupy. The user
time in MOS is approximately half of what it is in core
[18]. This problem has been solved by running the benchmark
program as a single user. This forces the same processes
into the same type of core. The elapsed time (real) is
accurate to the second, while the CPU times (user and
system) are measured to the 60th of a second. It was found
that the system times may vary by as much as 8.5 per cent,
and the real time by as much as 8.3 per cent.
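
Since user and system times are accumulated as counts of 1/60 second
samples, turning a counter into the tenths of a second shown in Table
VIII is a small rounding exercise. The helper below is a hypothetical
illustration only: its name and the round-to-nearest choice are
assumptions, and the time command itself may round differently.

```c
#include <assert.h>

/*
 * Convert a count of 60 Hz samples to tenths of a second,
 * rounding to the nearest tenth (halves round up).
 */
long ticks_to_tenths(long ticks)
{
    assert(ticks >= 0);
    return (ticks * 10 + 30) / 60;
}
```

For example, a counter of 282 samples is 4.7 seconds, so the helper
returns 47.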


process     real    user     sys

 1           nm      nm      nm
 2           40      4.7     3.9
 3           nm      nm      nm
 4           21      1.0     1.9
 5           nm      nm      nm
 6           37      4.2     4.0
 7           34     30.8     0.7
 8           nm      nm      nm
 9           41     10.9     3.4
10            5      0.0     0.6
11           nm      nm      nm
12            6      0.3     0.7
13            5      0.3     0.6
14           nm      nm      nm

sum (min)   3:09    52.2    15.8

Table VIII.  Individual Process Times


C.  BENCHMARK  PROGRAM

The benchmark program consists of all the
processes mentioned in section B. above in a multiprogrammed
mix. Time for the multiprogrammed benchmark is:

real    7:08
user    46.6
sys     18.0

Table IX.  Benchmark Program Evaluation

The command sequence "time sh benchmark" is used to
initiate processing of the benchmark.
APPENDIX  C:  NON-ADAPTIVE  SCHEDULING  CHANGES

A.  GENERAL 

This appendix contains detailed information
concerning the non-adaptive changes made to the MUNIX (based
on UNIX Version 5) scheduling algorithm.

B.  MAXIMUM  NUMBER  OF  PROCESSES  (NPROC)

1.  Change

NPROC was a constant defined in param.h to be
fifty. That means there can be no more than fifty processes
in the system at any one time. Twelve system functions used
NPROC for searching the process table. For example, if
function swtch was looking for the highest priority process,
it looked at all entries in the process table. Normally this
would not be considered wasteful, but the process table is
very seldom, if ever, completely full. This means there is
time being wasted if the process table does not hold fifty
processes. Since processes are entered into the process
table at the first available space, a counter could be used
to hold the maximum number of processes in the process
table. Time could be saved by searching the process table
from the beginning to the counter. This change was made as
follows:

1. A new integer variable, nproc (lower case), was
placed in proc.h to keep track of the last process in the
process table. The twelve functions that used NPROC now use
nproc.

2. Two lines of code have been added to function
newproc (in program slp.c). The code ensures that nproc is
incremented when necessary.

3. Two lines of code have been added to function
wait (in program sys1.c) to ensure that when the last
process in the process table terminates, nproc is
decremented. It is not sufficient to decrement nproc by one
in all cases. Example: The process table could have the
first nine process blocks allocated, a new process enters
and takes block ten, processes eight and nine terminate, then
process ten terminates. If nproc was decremented by one,
then all searches would look at blocks eight and nine
unnecessarily.
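
The bookkeeping for nproc can be sketched as below. This is not the
code added to newproc and wait: the array used stands in for the
process table, the function names alloc_block and free_block are
hypothetical, and nproc is assumed here to hold the number of leading
entries a search must examine (block numbers in the text start at
one, array indices here start at zero).

```c
#include <assert.h>

#define NPROC 50

int used[NPROC];   /* nonzero = process block allocated (stand-in table) */
int nproc;         /* number of leading entries that searches must examine */

/* Take the first available block, as the text describes; -1 if full. */
int alloc_block(void)
{
    int i;

    for (i = 0; i < NPROC; i++)
        if (!used[i]) {
            used[i] = 1;
            if (i + 1 > nproc)
                nproc = i + 1;      /* increment only when necessary */
            return i;
        }
    return -1;
}

/* Release a block; shrink nproc past every free block at the end. */
void free_block(int i)
{
    assert(i >= 0 && i < NPROC && used[i]);
    used[i] = 0;
    while (nproc > 0 && !used[nproc - 1])
        nproc--;                    /* may drop by more than one */
}
```

Replaying the example from the text: after ten allocations nproc is
10; releasing blocks eight and nine leaves it at 10; releasing block
ten then drops it directly to 7, so later searches skip the two empty
blocks as well.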

2.  Evaluation

The benchmark program (see Appendix B) was
run against the scheduling algorithm before and after this
change. Four runs were made, two with a drum being used for
/TMP files (temp files), and two without. The results are
listed in Tables III and IV. Real, user, and system times
are shown in minutes and seconds. Appendix B explains how
the system calculates these times and estimates their
accuracy. This testing was accomplished on the "B" system
(see section II.B.3.).


BEFORE CHANGES

real    [illegible]
user    2:12
sys      :42

AFTER CHANGES

real    6:00
user    2:00
sys      :41

Table III.  NPROC Change Evaluation with No Drum


BEFORE CHANGES

real    4:49
user    1:58
sys      :41

AFTER CHANGES

real    4:38
user    1:48
sys      :42

Table IV.  NPROC Change Evaluation with Drum

C.  LOOPING  IN  FUNCTION  SCHED

1.  Change

As described in sections B.1.b and B.1.c.(2)
of Appendix A, sched unnecessarily loops back to a point that
is repetitive. The two loops were changed as follows:

1. A label, "findsp" (find space for process), was
inserted where sched starts looking for core for the process
it just found (the process that has been out of core the
longest). When easy core has been found, instead of
branching back to look for the process that has been out of
core the longest, sched branches to findsp.

2. A pointer, "p2", was added to the declarations
of sched. The pointer p2 was substituted for p1 in the first
two instances after no easy core is found. This leaves p1
pointing to the process that has been out of core the
longest, giving no need to search for that process again.

2.  Evaluation

The  benchmark  program  (see  appendix  B)  was 
run  before  and  after  the  changes.  Several  runs  were  made 
because  the  statistics  showed  no  significant  savings  (see 
Table  V).  Testing  was  accomplished  on  the  "A"  system. 


Before Changes

real    7:08
user     :46.6
sys      :18.0

After Changes

real    7:08
user     :[illegible]
sys      :17.8

Table V.  Looping Change Evaluation
D.  SIZE  CHECK

1.  Change

In two separate places, function sched
swapped a process out of core without giving any
consideration as to whether that action created enough room
for the incoming process. Many times two or three processes
were swapped out of core when only one was necessary. This
creates a large overhead in swapping. A two pass check was
installed to circumvent this problem.

1. First pass - Check to see that the process
being swapped out of core is as large or larger than the one
being swapped in; if not, do not swap it out.

2. Second pass - If the first pass fails to create
enough room for the incoming process, swap eligible
processes out until enough room exists.

This change was implemented using a first/second pass
indicator (fspass) having the value of zero for the first
pass and one for the second pass. This indicator was "or"ed
with the size check, thereby using the same code for both
passes.
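
The way one test can serve both passes is shown in the sketch below.
Only the name fspass and its two values come from the text; the
function may_swap_out and its size parameters are hypothetical, and
the surrounding search for an eligible process is not shown.

```c
#include <assert.h>

/*
 * Returns 1 if the candidate may be swapped out on this pass.
 * First pass (fspass == 0): only if it is as large or larger than
 * the incoming process. Second pass (fspass == 1): always, because
 * the indicator is "or"ed with the size check.
 */
int may_swap_out(int fspass, int victim_size, int incoming_size)
{
    assert(fspass == 0 || fspass == 1);
    return fspass || victim_size >= incoming_size;
}
```

On the first pass a candidate of size 10 is refused for an incoming
process of size 12, while the same call with fspass set to one
accepts it.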


2.  Evaluation

Several runs were made with the benchmark
program on the "A" system because of the 26 percent savings
realized in real time (see Table VI). This change was also
tested on the "B" system, but a savings of only 6 percent
was found there. The difference in savings is explained by
the significant difference in available memory for each
system.


BEFORE CHANGES

real    7:08
user     :46.6
sys      :18.0

AFTER CHANGES

real    5:18
user     :45.3
sys      :17.9

Table VI.  Size Check Change Evaluation
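
The 26 percent figure can be checked from the real times, taking the
time before the change as 7:08 (the benchmark time given in Table IX)
and the time after as 5:18. The helpers below are a hypothetical
illustration; their names and the rounding to the nearest whole
percent are assumptions.

```c
#include <assert.h>

/* Convert minutes and seconds to seconds. */
int to_seconds(int min, int sec)
{
    assert(min >= 0 && sec >= 0 && sec < 60);
    return min * 60 + sec;
}

/* Savings as a whole percent of the "before" time, rounded to nearest. */
int percent_saved(int before, int after)
{
    assert(before > 0 && after >= 0 && after <= before);
    return ((before - after) * 100 + before / 2) / before;
}
```

Here 7:08 is 428 seconds and 5:18 is 318 seconds, a saving of 110
seconds, or about 25.7 percent, which rounds to the 26 percent quoted.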
APPENDIX  D:  PRIORITY  CALCULATION  SUBROUTINE


/*
 * This subroutine was written by Ronald E. Joy and
 * used for an adaptive scheduler in November of 1975
 */

#include "../param.h"
#include "../proc.h"

/*
 * If the following variables are needed in any
 * other program, include schpri.h
 */

int tchg1  = 4;     // slope 1 changes at this
                    // time (seconds).
int slop1  = 2;     // number of bits to shift
                    // left = *4 (slope 1).
int slop2  = 0;     // number of bits to shift
                    // left = *1 (slope 2).
int chgt1  = 16;    // if tchg1 or slop1 are
                    // changed, chgt1 must be
                    // changed also.
                    // chgt1 = tchg1 << slop1
int bgpri  = 300;   // back ground priority
int minpri = -300;  // lowest priority (value)
int maxpri = 300;   // max priority (value)
int mtpri  = 250;   // max time priority
int mxtime = 540;   // max time = 9 min (sec.)
int maxres = 7200;  // max resource units, this
                    // value is = 2 minutes.


/*
 * The following code is used to set a users priority
 * between -300 and 300. A value of 0 is the least
 * critical priority, with -300 being the most
 * critical. Any value over 0 is non-critical. A
 * process is critical if it has not received as
 * many resource units as dictated by the policy
 * function.
 */
schpri(nrp)                      // schedule priority
struct proc *nrp;                // pointer to process that needs
                                 // a priority calculated.
{
    register struct proc *rp;    // process pointer
    register int pri;            // calculated priority
    register int resr;           // resource units received
    int time;                    // current time of this process

    rp = nrp;
    time = rp->p_time;
    resr = rp->p_resr;
    if (rp->p_flag&PSTM || rp->p_flag&TPWAIT)
        // if this process is already a back ground
        // process or it is waiting for terminal I/O
        pri = bgpri;             // priority = back ground
    else {
        if (resr < 0)            // has the resource count
            // gone over 32767. there is no chance
            // of this happening with mxtime set to
            // its current value.
            {rp->p_flag |= PSTM; pri = bgpri;}
            // set PSTM flag and priority to bg
        else {
            if (time >= mxtime)  // if process has
                // been alive longer than mxtime
                if (resr > maxres)   // if resource
                    // count greater than maxres
                    {rp->p_flag |= PSTM; pri = bgpri;}
                    // set PSTM and pri to bg
                else
                    pri = mtpri;     // max time priority
            else {
                if (time < tchg1)    // if process has been
                    // alive less than tchg1
                    pri = resr - (time << slop1);
                else
                    pri = resr - chgt1 -
                        ((time - tchg1) << slop2);
                if (pri > maxpri)    // if priority is too large
                    pri = maxpri;    // fix it
                else
                    if (pri < minpri)    // too small
                        pri = minpri;    // fix it
            }
        }
    }
    return (pri);    // return priority to caller
}
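
The core of the calculation, the two-slope policy function and the
clamping of the result, can be exercised on its own. The sketch below
is a simplified restatement rather than the subroutine itself: the
names policy_units, adaptive_priority, and is_critical are
hypothetical, the constants are copied from the declarations above,
and the background, terminal wait, and mxtime branches are
deliberately left out.

```c
#include <assert.h>

enum {
    TCHG1 = 4,                 /* slope 1 changes at this time (seconds) */
    SLOP1 = 2,                 /* shift for slope 1 (times 4) */
    SLOP2 = 0,                 /* shift for slope 2 (times 1) */
    CHGT1 = TCHG1 << SLOP1,    /* units owed at the change point = 16 */
    MINPRI = -300,
    MAXPRI = 300
};

/* Resource units the policy says a process should have after `time` seconds. */
int policy_units(int time)
{
    assert(time >= 0);
    if (time < TCHG1)
        return time << SLOP1;
    return CHGT1 + ((time - TCHG1) << SLOP2);
}

/* Priority = units received minus units owed, clamped to the legal range. */
int adaptive_priority(int resr, int time)
{
    int pri = resr - policy_units(time);

    if (pri > MAXPRI)
        pri = MAXPRI;
    else if (pri < MINPRI)
        pri = MINPRI;
    return pri;
}

/* Zero or below is critical; any value over zero is non-critical. */
int is_critical(int pri)
{
    return pri <= 0;
}
```

A process that is 10 seconds old is owed 16 + 6 = 22 units; if it has
received none, its priority is -22 and it is critical. A process that
is 2 seconds old and has received 10 units is owed 8, giving a
priority of 2, which is non-critical.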